Fast kernel matrix-vector multiplication with application to Gaussian process learning
Abstract
A number of core computational problems in machine learning, both old and new, can be cast as a matrix-vector multiplication between a kernel matrix or class-probability matrix and a vector of weights. This arises prominently, for example, in the kernel estimation methods of nonparametric statistics, many common probabilistic graphical models, and the more recent kernel machines. After highlighting the existence of this computational problem in several well-known machine learning methods, we focus for clarity on a solution for one specific example, Gaussian process (GP) prediction, one whose applicability has been particularly hindered by this computational barrier. We demonstrate the application of a recent N-body approach developed specifically for statistical problems, employing adaptive computational geometry and finite-difference approximation. This core algorithm reduces the $O(N^2)$ matrix-vector multiplications within GP learning to $O(N)$, making the resulting overall learning algorithm $O(N)$. GP learning for N = 1 million points is demonstrated.

1 Kernel Matrix-Vector Multiplications in Learning

A kernel matrix $\Phi$ contains the kernel interaction of each point $x_q$ in a query (test) dataset $X_Q$ (having size $N_Q$) with each point from a reference (training) dataset $X_R$ (having size $N_R$), where the kernel function $K(\cdot)$ often has some scale parameter $\sigma$ (the 'bandwidth'). Often the 'kernel function' is actually a probability density function, such as the Gaussian. In such cases $\Phi$ is typically a class-probability matrix. The query and reference sets can be the same set. Often the core computational cost of a statistical method boils down to a multiplication of this matrix $\Phi$ with some vector of weights $w$. For example, in the weighted form of kernel density estimation, the density estimate at the $q$-th test point $x_q$ is $\hat{p}(x_q) = \sum_{n=1}^{N_R} w_n\, K_\sigma(x_q, x_n)$.
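To make the quantity concrete, here is a minimal NumPy sketch (the function names, the Gaussian kernel form, and the random data are illustrative assumptions, not the paper's implementation) of the dense kernel matrix-vector product that the N-body machinery described above is designed to accelerate. The direct computation costs $O(N_Q N_R)$.

```python
import numpy as np

def gaussian_kernel_matrix(X_query, X_ref, sigma):
    """Kernel matrix Phi with Phi[q, r] = exp(-||x_q - x_r||^2 / (2 sigma^2))."""
    sq_dists = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def weighted_kde(X_query, X_ref, weights, sigma):
    """Weighted kernel density estimate: a single kernel matrix-vector product."""
    Phi = gaussian_kernel_matrix(X_query, X_ref, sigma)  # shape (N_Q, N_R)
    return Phi @ weights                                 # O(N_Q * N_R) work

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_ref = rng.normal(size=(500, 2))    # reference (training) points
    X_query = rng.normal(size=(100, 2))  # query (test) points
    w = np.full(500, 1.0 / 500)          # uniform weights
    est = weighted_kde(X_query, X_ref, w, sigma=0.5)
    print(est.shape)                     # (100,)
```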
Similar papers
Large-Scale Multiclass Transduction
We present a method for performing transductive inference on very large datasets. Our algorithm is based on multiclass Gaussian processes and is effective whenever the multiplication of the kernel matrix or its inverse with a vector can be computed sufficiently fast. This holds, for instance, for certain graph and string kernels. Transduction is achieved by variational inference over the unlabe...
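As a small aside on why graph kernels admit fast multiplications, here is a minimal sketch under assumed conditions (the chain-graph adjacency, the regularization eps, and the Laplacian-based precision are illustrative, not this paper's construction): when the prior precision $K^{-1}$ is a sparse regularized graph Laplacian, multiplying it with a vector costs time proportional to the number of edges rather than $O(n^2)$.

```python
import numpy as np
import scipy.sparse as sp

# Assumed setup (not this paper's model): a graph-based GP prior with sparse
# precision K^{-1} = L + eps*I, where L is the graph Laplacian. Multiplying
# K^{-1} by a vector then costs O(number of edges).
n = 1_000_000
ones = np.ones(n - 1)
A = sp.diags([ones, ones], offsets=[-1, 1], format="csr")   # chain-graph adjacency
degrees = np.asarray(A.sum(axis=1)).ravel()
L = sp.diags(degrees) - A                                    # graph Laplacian
precision = L + 1e-2 * sp.identity(n)                        # sparse K^{-1}

v = np.random.default_rng(1).normal(size=n)
u = precision @ v    # fast matvec with the *inverse* kernel matrix
print(u.shape)       # (1000000,)
```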
Scalable Log Determinants for Gaussian Process Kernel Learning
For applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute a log determinant of an $n \times n$ positive definite matrix, and its derivatives – leading to prohibitive $O(n^3)$ computations. We propose novel $O(n)$ approaches to estimating these quantities from only fast matrix vector mu...
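For illustration, here is a small sketch of the stochastic trace idea behind such estimators: $\log\det(A) = \mathrm{tr}(\log A)$ can be estimated from random probe vectors. The dense eigendecomposition below is only a stand-in for the matrix logarithm; the methods referenced here instead approximate each quadratic form with Lanczos quadrature driven by fast matrix-vector multiplications. The function name and test kernel are assumptions for the sketch.

```python
import numpy as np

def hutchinson_logdet(A, num_probes=100, seed=None):
    """Estimate log det(A) = tr(log A) as the average of z^T log(A) z over
    random +/-1 probe vectors z (Hutchinson's trace estimator).

    Illustration only: log(A) is formed densely via an eigendecomposition.
    Scalable methods approximate z^T log(A) z via Lanczos, touching A only
    through matrix-vector products.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(A)              # A must be SPD
    logA = (eigvecs * np.log(eigvals)) @ eigvecs.T    # dense log(A)
    Z = rng.choice([-1.0, 1.0], size=(n, num_probes)) # Rademacher probes
    return np.mean(np.einsum("ip,ij,jp->p", Z, logA, Z))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2))
    K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1)) + 1e-2 * np.eye(300)
    print(hutchinson_logdet(K, seed=1), np.linalg.slogdet(K)[1])
```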
Improved fast Gauss transform User manual
In most kernel-based machine learning algorithms and non-parametric statistics, the key computational task is to compute a linear combination of local kernel functions centered on the training data, i.e., $f(x) = \sum_{i=1}^{N} q_i\, k(x, x_i)$, which is the discrete Gauss transform for the Gaussian kernel. $f$ is the regression/classification function in the case of regularized least squares, Gaussian process regre...
A Parallel Tree Code for Computing Matrix-Vector Products with the Matérn Kernel
The Matérn kernel is one of the most widely used covariance kernels in Gaussian process modeling; however, large-scale computations have long been limited by the expensive dense covariance matrix calculations. As a sequel to our recent paper [Chen et al. 2012] that designed a tree code algorithm for efficiently performing the matrix-vector multiplications with the Matérn kernel, this paper docu...
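To show what such a product involves, here is a minimal sketch of the dense Matérn ν = 3/2 kernel matrix-vector multiplication (the lengthscale, data, and function name are illustrative assumptions; the paper's tree code replaces this $O(N^2)$ computation with a hierarchical approximation).

```python
import numpy as np

def matern32_kernel(X, Y, lengthscale=1.0):
    """Matern nu=3/2 covariance: k(r) = (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l)."""
    r = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    s = np.sqrt(3.0) * r / lengthscale
    return (1.0 + s) * np.exp(-s)

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
K = matern32_kernel(X, X, lengthscale=0.2)  # dense N x N covariance matrix
v = rng.normal(size=1000)
w = K @ v                                   # the O(N^2) product a tree code accelerates
print(w[:3])
```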
Fast matrix-vector product based FGMRES for kernel machines
Algorithms based on kernel methods play a central role in statistical machine learning. At their core are a number of linear algebra operations on matrices of kernel functions which take as arguments the training and testing data. A kernel function $\Phi(x_i, x_j)$ generalizes the notion of the similarity between a test and training point. Given a set of data points, $X = \{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbb{R}^d$, ...
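A minimal sketch of this idea using SciPy's restarted GMRES (the Gaussian kernel, the regularization lam, and the dense matvec below are stand-in assumptions; the paper's FGMRES pairs the solver with a genuinely fast matrix-vector product): the kernel-machine system $(K + \lambda I)\alpha = y$ is solved while touching $K$ only through matrix-vector products.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
N = 500
X = rng.normal(size=(N, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)
lam = 1.0

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq)  # dense Gaussian kernel matrix; a stand-in for a fast MVM

def matvec(v):
    # In a scalable solver this product would come from a fast transform or
    # tree code instead of a dense N x N matrix.
    return K @ v + lam * v

A = LinearOperator((N, N), matvec=matvec, dtype=np.float64)
alpha, info = gmres(A, y, restart=50, maxiter=200)
print(info, np.linalg.norm(matvec(alpha) - y))  # info == 0 means converged
```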
Publication date: 2015